Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW

نویسندگان

Junwei Yue

Fumiya Shiozawa

Shohei Toyama

Yutaka Yamauchi

Kayoko Ito

Daisuke Saito

Nobuaki Minematsu

چکیده

Shadowing has become a well-known method to improve learners’ overall proficiency. Our previous studies realized automatic scoring of shadowing speech using HMM phoneme posteriors, called GOP (Goodness of Pronunciation) and learners’ TOEIC scores were predicted adequately. In this study, we enhance our studies from multiple angles: 1) a much larger amount of shadowing speech is collected, 2) manual scoring of these utterances is done by two native teachers, 3) DNN posteriors are introduced instead of HMM ones, 4) language-independent shadowing assessment based on posteriors-based DTW (Dynamic Time Warping) is examined. Experiments suggest that, compared to HMM, DNN can improve teacher-machine correlation largely by 0.37 and DTW based on DNN posteriors shows as high correlation as 0.74 even when posterior calculation is done using a different language from the target language of learning.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Calibration of Phone Likelihoods in Automatic Speech Recognition

In this paper we study the probabilistic properties of the posteriors in a speech recognition system that uses a deep neural network (DNN) for acoustic modeling. We do this by reducing Kaldi’s DNN shared pdf-id posteriors to phone likelihoods, and using test set forced alignments to evaluate these using a calibration sensitive metric. Individual frame posteriors are in principle well-calibrated...

متن کامل

Content Normalization for Text-independent Speaker Verification

In the past few years, Deep Neural Network (DNN) based ivector Speaker Verification (SV) systems have shown to provide state-of-the-art performance. However, error rates increase drastically for short duration recordings. In this paper, we improve the i-vector approach for short utterances, (i) by using smoothed DNN posteriors for i-vector extraction, and (ii) by normalizing the content of the ...

متن کامل

Low-rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance Dnn Acoustic Modeling

We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of lowdimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstructured sparse noise over the optimal posteriors. We aim to investigate different ways to structure the DNN outputs by exploiting low-rank repres...

متن کامل

Low-rank Representation for Enhanced Deep Neural Network Acoustic Models

Automatic speech recognition (ASR) is a fascinating area of research towards realizing humanmachine interactions. After more than 30 years of exploitation of Gaussian Mixture Models (GMMs), state-of-the-art systems currently rely on Deep Neural Network (DNN) to estimate class-conditional posterior probabilities. The posterior probabilities are used for acoustic modeling in hidden Markov models ...

متن کامل